Abstract

A planning system which has a fixed method will have trouble
performing efficiently over a wide range of problems. This paper
provides an alternative approach, called multi-method planning, which
can potentially achieve planner completeness, planning time
efficiency, and plan length reduction at the same time. A way to
construct multi-method planners from a set of single-method planners
is introduced, and the constructed planners are compared with
single-method planners. Analytical and experimental results indicate
the potential of this approach.

Introduction 

Research in domain-independent planning has been a mainstream topic
in the area of AI planning. In the design of domain-independent
planning systems, it is important to consider the following criteria:
(1) the ability to find a plan, or an optimal plan, for any problem in
an arbitrary domain; (2) the amount of time required to find the plan;
and (3) the execution cost of the plan itself. The key issue here is
how to construct a single planning method, or a coordinated set of
different planning methods, that has sufficient scope and efficiency.

Most planning systems encode planning behaviors within a fixed
planning method such as linear planning, nonlinear planning,
abstraction, and so on. Our hypothesis is that no single method will
satisfy the above three criteria for all situations. For example,
STRIPS-type linear planners are based on the linearity assumption,
which reduces the number of goal conjuncts to consider at each
planning step. However, the assumption makes the planners unable to
generate an optimal plan for certain problems in domains like the
blocks world (Sussman, 1975), and causes them to fail to find a plan
in domains with irreversible operators (Veloso, 1989). On the other
hand, nonlinear planners, which are free from the linearity
assumption, may need more effort to find a plan, because they have
more choices to consider at each planning step.¹

*This work was sponsored by the Defense Advanced Research Projects
Agency (DOD) and the Office of Naval Research under contract number
N00014-89-K-0155.

¹The term "nonlinear" in this context implies that operators in
service of different goal conjuncts can be interleaved. It does not
necessarily mean that partially ordered plans are used.

An alternative approach is to construct a multi-method planner which
consists of a sequence of single-method planners, where each
single-method planner has a different set of constraints. One way to
determine the sequence of planners is to start with the most
restricted planner and to progress to less restricted planners, as
the current one fails, until a solution is found. The idea is that if
the constraints used in restricted planners can prune the search space
and the eliminated space contains inefficient plans, the problems
solvable by more restricted planners should be solved more quickly,
generating efficient plans, while problems requiring less restricted
planners should not waste too much extra time trying out the
insufficient early planners. This approach is inspired by iterative
deepening (Korf, 1985). In iterative deepening, a sequence of
depth-first searches is performed, each to a greater depth than the
previous one. If a solution is found at a shallow depth, the cost of
searching to a greater depth is saved. If a solution is not found at a
particular depth, a deeper search is performed. The cost of doing the
shallower searches is then wasted, but since the deeper search costs
at least B times the cost of the shallower search, where B is the
branching factor of the search tree, this wasted cost can be
relatively small. Thus, if the proportion of problems solvable at
shallow depths is large enough, and the ratio of costs for successive
levels is large enough, there should be a net gain.

This paper describes an approach to building efficient multi-method
planners from a set of single-method planners and evaluates their
performance analytically and experimentally. We implemented both the
single-method planners and the multi-method planners in the context
of the Soar architecture (Laird, Newell, & Rosenbloom, 1987). Soar is
a useful vehicle for this work because its impasse-driven subgoaling
scheme provides the necessary context for planning and its multiple
problem-space scheme provides a framework for multi-method planning.
To simplify the analysis, we focus on plans represented by STRIPS-like
operators.


Single-method  planners 

A planner can be characterized by a combination of a set of biases
over the space of plans considered and a set of control strategies
that determine which plans should be considered before the others
within the space. In this section, we introduce six single-method
planners which are defined by different combinations of biases, as
implemented in Soar, and evaluate them experimentally in terms of
planner completeness, planning efficiency, and plan length. We do not
focus on control strategy here.

Planning  biases 

With the view of planning as search over a plan space, which consists
of all possible sequences of operators that lead to the goal state,
bias is defined as a constraint over the space of plans considered
(Rosenbloom, Lee, & Unruh, 1992). To be specific, bias is "any basis
for choosing one plan over another other than the goal test".
Together with the planner's input (the combination of an initial state
and a goal), bias thus determines which portion of the entire plan
space can or will be the output of planning. Thus, the combination of
biases used in a planner characterizes the plans that can be
generated.

The planning biases that we have concentrated on so far are protection
and linearity, two common biases in planning, and directness. A
protection bias eliminates all plans in which an operator undoes an
initial goal conjunct that is either true a priori or established by
an earlier operator in the sequence.² A linearity bias removes from
the plan space all plans in which operators in service of different
unachieved goal conjuncts occur in succession; that is, once an
operator for one unachieved goal conjunct is in the plan, operators
for other conjuncts can be placed only after the sequence of operators
for the first goal conjunct (Fikes, Hart, & Nilsson, 1971). A
directness bias eliminates all plans in which there is at least one
operator that does not directly achieve a goal conjunct included in
the problem definition.
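As a rough illustration of two of these biases, the sketch below expresses protection and directness as filters over candidate plans. The `Op` representation, the fact strings, and both function names are hypothetical choices made for this example; they are not the Soar encoding used in the paper, and the linearity bias is deliberately left out because it depends on which goal conjunct each operator serves.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """A STRIPS-like operator (hypothetical representation for this sketch)."""
    name: str
    add: frozenset
    delete: frozenset


def violates_protection(plan, initial, goal):
    """True if some operator deletes a goal conjunct that currently holds."""
    state = set(initial)
    for op in plan:
        if state & goal & op.delete:
            return True
        state = (state - op.delete) | op.add
    return False


def violates_directness(plan, goal):
    """True if some operator adds no conjunct of the original goal."""
    return any(not (op.add & goal) for op in plan)


# Sussman's anomaly: C is on A; A and B are on the table.
initial = frozenset({"on C A", "ontable A", "ontable B", "clear B", "clear C"})
goal = frozenset({"on A B", "on B C"})

c_to_table = Op("move C Table", frozenset({"ontable C", "clear A"}),
                frozenset({"on C A"}))
b_onto_c = Op("move B C", frozenset({"on B C"}),
              frozenset({"ontable B", "clear C"}))
a_onto_b = Op("move A B", frozenset({"on A B"}),
              frozenset({"ontable A", "clear B"}))
b_to_table = Op("move B Table", frozenset({"ontable B", "clear C"}),
                frozenset({"on B C"}))

# The three-step plan keeps every achieved goal conjunct, but its first
# operator only serves a precondition, so it is removed by directness.
optimal = [c_to_table, b_onto_c, a_onto_b]

# This longer plan achieves (on B C) and then undoes it, so it is
# removed by protection.
undoing = [b_onto_c, b_to_table, c_to_table, b_onto_c, a_onto_b]
```

Under these assumptions the optimal plan for Sussman's anomaly survives the protection filter but not the directness filter, which is consistent with the later observation that a direct planner cannot solve that problem.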

These three biases have all been implemented as options within a
hybrid planning system. The implemented system consists of six
single-method planners defined by the cross-product of two bias
dimensions: goal flexibility and goal protection. Figure 1
characterizes the 3x2 set of planning methods derived from these bias
dimensions. The goal-flexibility dimension is shown along the top row
of the figure. It determines the degree of flexibility the planner has
in generating new subgoals and in shifting the focus in the goal
hierarchy. This dimension subsumes the directness and linearity
biases. The most restricted point along this dimension allows no
generation of new subgoals for precondition violation. That is, if an
operator has unmet preconditions when the attempt is made to
incorporate it into a plan, that operator is rejected. This implements
a directness bias by ensuring that each of the operators in a plan
directly achieves an initial goal conjunct, rather than an unmet
precondition of another operator.

²Other forms of protection can be found in (Sussman, 1975; Warren,
1974; Waldinger, 1975).

Figure 1: The planning methods generated by the bias dimensions (the
bottom-left cell represents an extended blocks world problem where a
block that is not clear can be moved, dropping all the blocks above it
onto the table).

The second point along the flexibility dimension allows generation of
new subgoals, but only a single local set of conjuncts is attended to
at any point in time. Initially the local set consists of the
conjuncts in the problem specification. However, whenever a selected
operator has one or more unmet preconditions, the previous local set
is pushed on a stack, and the operator's unmet preconditions become
the new local set. When the operator's preconditions are satisfied,
the stack is popped to return to the previous set. Under the
assumption that an operator achieves only one goal conjunct, this
implements linear planning by restricting the placement of an operator
within the context of the local conjuncts from which it arose, thus
ensuring that operators for different goal conjuncts cannot be
interleaved in the output plans.

The third point along the flexibility dimension allows the global use
of subgoals; that is, new goal conjuncts are generated for unmet
preconditions, and operators are simultaneously considered for all
unsatisfied conjuncts. This is the least restricted version, and
implements nonlinear planning by allowing operators for different goal
conjuncts to be interleaved.

The goal-protection (GP) dimension is shown along the left side of
Figure 1. The two points implemented along this dimension correspond
to goal protection (that is, every achieved top-level goal conjunct is
protected until the problem is solved) and to no goal protection. A
goal protection bias shrinks the search space by cutting off sequences
of operators which violate goal protection.

Each of the cells in Figure 1 shows a label representing the planner
for that cell along with a problem that is just hard enough to require
that planner. The most restricted planner (M1), a direct
goal-protection planner, is in the top-left cell of the figure. While
quite restrictive, it is sufficient to solve the block-stacking
problem shown in that cell of the figure. The least restricted planner
(M6), a nonlinear planner without goal protection, is in the
bottom-right cell of the figure. It is the only planner in the figure
capable of generating an optimal solution to the blocks-world problem
shown in that cell. Between these two extremes, moving up or to the
left yields more bias, while moving down or to the right yields less
bias.

Implementation  in  Soar 

The hybrid planning system containing the six single-method planners
has been implemented in the context of the Soar architecture (Laird,
Newell, & Rosenbloom, 1987). Problem solving in Soar is driven by
applying operators to states within a problem space to achieve a goal.
Knowledge is stored in a permanent recognition memory and a temporary
working memory. Recognition memory consists of a set of variabilized
rules, where the conditions of each rule are matched against working
memory and the actions of a matched rule are instantiated to propose
preferences that change the working memory. The most typical
preferences are feasibility (acceptable, reject) and desirability
(best, better, indifferent, worse, worst) preferences. These
preferences are held in preference memory and used by a decision
procedure to determine what changes are made to working memory. A
subgoal is created when an impasse arises in the decision procedure.
As the result of the subgoal, new preferences are generated and new
rules are learned (via a chunking process) whose actions are based on
the working-memory elements that are the results of the subgoal, and
whose conditions are based on the working-memory elements that led to
the results (Rosenbloom & Newell, 1986). In effect, chunking is much
like explanation-based learning (Rosenbloom & Laird, 1986).

Figure 2 illustrates initial traces of particular versions of the
single-method planners as implemented in Soar for Sussman's anomaly in
the blocks world. Planning starts with a combination of the initial
state and the entire conjunctive goal, (And (On B C) (On A B)). By
means-ends analysis, the planner generates the set of candidate
operators, (move B C) and (move A B), that are known to potentially be
able to achieve any of the goal conjuncts. A tie impasse then occurs
unless there is information about how to pick among them.³ In this tie
impasse, a look-ahead search begins by selecting one of the
alternatives to evaluate; here it is (move A B). Its preconditions are
tested and if it is known to be applicable, it is executed. If it is
not known to be applicable, what happens next depends on which biases
are used in the method.

³For simplicity of presentation, these traces only show tie impasses.
Refer to (Rosenbloom, Lee, & Unruh, 1990) for other types of impasses
in planning.

Figure 2: Planning in Soar. Panels: (b) Linear & protection (M2);
(c) Nonlinear & protection (M3); (d) Linear & no protection (M5).

If the directness bias is used, as in Figure 2(a), the evaluation of
(move A B) is terminated immediately, with failure as the evaluation
value, and the other operator (move B C) is selected. If the
directness bias is not used, a new set of goal conjuncts is generated
from the operator's unmet preconditions (Figure 2(b-d)). The
difference between linear and nonlinear planning, at least for these
versions, is in the focus of operator generation from the new goal
hierarchy. Linear planning shifts focus completely to the new
conjunct, (Clear A) as in Figure 2(b) and (d), stays with it until it
is achieved, and then pops back to the original conjunct that led to
the impasse. Processing shifts to one of its siblings (if there are
any) only after the original conjunct is achieved. This eventually
leads to failure if a protection bias is used (Figure 2(b)), or
generates a non-optimal plan if a protection bias is not used (Figure
2(d)). Nonlinear planning instead shifts to an expanded set of
conjuncts that includes the new set plus the original set minus the
conjunct that led to the impasse, yielding (On B C) and (Clear A) in
this example (Figure 2(c)). At any point in time, an operator can be
selected for any of these conjuncts, enabling operator sequences to be
interleaved as necessary. For all of the above planning methods
(except Figure 2(a)), once the new focus has been determined, planning
continues recursively by using means-ends analysis to generate
candidate operators from the new goal hierarchy.
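The two focus-shifting rules described above can be sketched as small set operations. The function names and the tuple encoding of conjuncts are assumptions made for this illustration; the actual system expresses the same idea through Soar's goal hierarchy rather than through explicit Python sets.

```python
def linear_focus(stack, local_set, unmet_preconditions):
    """Linear planning: push the current local set and attend only to the
    unmet preconditions of the selected operator."""
    stack.append(frozenset(local_set))
    return frozenset(unmet_preconditions)


def linear_resume(stack):
    """Linear planning: once the preconditions hold, pop back to the
    previous local set."""
    return stack.pop()


def nonlinear_focus(conjuncts, impasse_conjunct, unmet_preconditions):
    """Nonlinear planning: the new set plus the original set minus the
    conjunct that led to the impasse."""
    return (frozenset(conjuncts) - {impasse_conjunct}) | frozenset(unmet_preconditions)


# Sussman's anomaly: (move A B) is selected for (On A B) but needs (Clear A).
original = {("On", "B", "C"), ("On", "A", "B")}
unmet = {("Clear", "A")}

stack = []
linear_set = linear_focus(stack, original, unmet)
nonlinear_set = nonlinear_focus(original, ("On", "A", "B"), unmet)
```

With these inputs the linear rule attends only to (Clear A), while the nonlinear rule yields (On B C) and (Clear A), matching the example in the text.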

Experimental  performance  of  the 
single-method  planners 

Experimental results from the six planners are shown in Table 1. These
data come from running each planner on the same set of 100 problems.
Because random choices are made among the goal conjuncts and among the
operators proposed for evaluation, three trials are made for each
problem and the results are averaged. For each problem, an initial
state was randomly generated containing three or four blocks. Likewise
a set of goal conjuncts was randomly generated that numbered between
two and the number of blocks in the initial state. Learning was turned
on for each problem, but only within-trial transfer was allowed; that
is, rules learned during one problem were not used for other problems.
This learning essentially enables dependency-directed backtracking and
transfer across decisions for a single problem.

Table 1(a) shows the number of problems solvable in principle by each
cell's planner, plus a label for the problem set that this implicitly
defines. Not surprisingly, this shows a monotonic relationship between
planner bias and scope, from a low of 68 problems for the most
restricted planner to a high of 100 problems for the least restricted
planner. Tables 1(b) and 1(c) show the average number of decisions and
the average plan lengths (which should positively correlate,
respectively, with planning time and execution time) for each of the
four problem sets defined in Table 1(a). These four problem sets are
associated with the four rows within each cell. The averages in each
cell only include the data from the trials that were solved within an
a priori limit of 300 decisions. Since 99% of the solvable problems
were actually solved within this limit, this includes nearly all of
the trials.

The timing results in Table 1(b) show that planning effort is also a
monotonically decreasing function of the amount of bias along these
dimensions (though there is one reversal when going from M2 to M3 for
problem set A2). For example, for problem set A1, effort ranged from a
low of 16.3 decisions for the most biased method to a high of 39.0
decisions for the least biased method. This trade-off between
efficiency and completeness implies that selecting an appropriate
amount of bias for a given problem is critical for finding a solution
quickly.

Table 1(c) exhibits a monotonic relationship between plan length and
the amount of bias used, but only for directness and protection. It
shows that the linearity bias does not help here in generating shorter
plans. The most likely cause of the reversal for the linearity bias is
that this bias is weak enough to allow solutions to be found for all
blocks world problems, but strong enough to eliminate optimal
solutions for some of the problems; for example, when the shortest
plan requires operators in service of different goal conjuncts to be
considered before the current goal conjunct is achieved.

Multi-method  planners 

(a) Number of problems solvable in principle.

              (Directness)    (Linear)        (Nonlinear)
GP            M1: 68 (A1)     M2: 95 (A2)     M3: 96 (A3)
No GP         M4: 68 (A4)     M5: 100 (A5)    M6: 100 (A6)

(b) Average number of decisions per problem solved.

                      (Directness)   (Linear)   (Nonlinear)
GP                    M1             M2         M3
          A1 (A4)     16.3           18.7       19.5
          A2          -              28.0       27.4
          A3          -              -          27.8
          A5 (A6)     -              -          -
No GP                 M4             M5         M6
          A1 (A4)     16.3           30.5       39.0
          A2          -              39.1       49.9
          A3          -              40.3       51.3
          A5 (A6)     -              39.9       52.6

(c) Average plan length per problem solved.

                      (Directness)   (Linear)   Global (Nonlinear)
GP                    M1             M2         M3
          A1 (A4)     1.82           1.84       1.89
          A2          -              2.41       2.37
          A3          -              -          2.39
          A5 (A6)     -              -          -
No GP                 M4             M5         M6
          A1 (A4)     1.82           3.61       3.32
          A2          -              4.41       4.14
          A3          -              4.57       4.17
          A5 (A6)     -              4.57       4.36

Table 1: Results from the six planners on three independent
repetitions of a hundred randomly generated blocks world problems.

⁴In the standard blocks world domain, M1 is the same method as M4.
Although M5 is a different method from M6, and may not be able to
generate an optimal plan for this domain, both M5 and M6 are complete
planners for this blocks world domain, yielding A5 = A6.

One of the main problems with the planners examined in the previous
section is that each is either incomplete or performs a significant
amount of excess work for some of the problems (both in planning and
execution). An alternative approach is to build a multi-method planner
which has a coordinated set of planning methods, where each individual
planning method has a different set of constraints. A multi-method
planner can be coordinated in either a sequential or a time-shared
manner. A sequential multi-method planner consists of a sequence of
single-method planners, while a time-shared multi-method planner
consists of a set of single-method planners, where each method is
active in turn for a given time slice (Barley, 1991).⁵

⁵Another type of coordination for a multi-method planner would be to
run the methods in parallel, as in multi-agent planning; however, the
focus here is on the control of a single serial agent only.

Figure 3: A restricted dominance graph for the single-method planners.

In this paper, sequential multi-method planning is investigated and
evaluated both analytically and empirically. To simplify the analysis
we focus on a special type of sequential multi-method planner called a
monotonic multi-method planner, where the single-method planners are
ordered according to increasing coverage and decreasing efficiency.
That is, planning starts with the most restricted planner, and then
successively relaxes the restrictions until a method is found that is
sufficient for the problem. We first present a scheme to construct
monotonic multi-method planners from a set of single-method planners,
and then provide a formal model to compare the performance of
constructed monotonic multi-method planners with single-method
planners. Experimental results for them follow at the end of the
section.
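The control regime of a sequential multi-method planner can be sketched as a simple loop over an ordered list of planners. Everything in the block below is illustrative: the `Planner` type, the `run_sequential` function, and the two toy planners are assumptions for this example, whereas the paper's planners are Soar problem spaces coordinated by meta-level control rules.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Plan = List[str]
# A planner returns a plan, or None when its biases rule out every plan.
Planner = Callable[[str], Optional[Plan]]


def run_sequential(planners: Sequence[Tuple[str, Planner]], problem: str):
    """Try each planner in order (most restricted first) until one succeeds.

    Returns (name of the successful planner, plan, names tried so far);
    the first two entries are None if every planner fails.
    """
    tried = []
    for name, planner in planners:
        tried.append(name)
        plan = planner(problem)
        if plan is not None:
            return name, plan, tried
    return None, None, tried


# Toy stand-ins: the restricted planner only handles "easy" problems,
# while the unrestricted planner handles anything at a higher cost.
def restricted(problem: str) -> Optional[Plan]:
    return ["direct-step"] if problem == "easy" else None


def unrestricted(problem: str) -> Optional[Plan]:
    return ["setup-step", "direct-step"]


sequence = [("restricted", restricted), ("unrestricted", unrestricted)]
easy_result = run_sequential(sequence, "easy")
hard_result = run_sequential(sequence, "hard")
```

On the easy problem only the restricted planner runs; on the hard problem the restricted planner's failed attempt is the wasted cost that the analysis in this section accounts for.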

Constructing  monotonic  multi-method 
planners 

Let $M_{k_i}$ be a single-method planner, which can range over M1 to
M6 defined in the previous section. A sequential multi-method planner
which consists of $n$ different single-method planners is denoted as
$M_{k_1} \rightarrow M_{k_2} \rightarrow \cdots \rightarrow M_{k_n}$.
Let $A$ be a sample set of problems, and let $A_{k_i} \subseteq A$ be
the subset of problems which are solvable in principle by $M_{k_i}$.
The functions $s(M_{k_i}, A_s)$ and $l(M_{k_i}, A_s)$ represent
respectively the average cost that $M_{k_i}$ requires to succeed and
the average length of plans generated by $M_{k_i}$, for the problems
in $A_s \subseteq A_{k_i}$. Similarly, $f(M_{k_i}, A_f)$ represents
the average wasted cost for $M_{k_i}$ to fail for the problems in
$A_f \subseteq A - A_{k_i}$.

A restricted dominance relation $M_x \prec M_y$ is defined between two
different single-method planners, $M_x$ and $M_y$, if the following
conditions hold:

$A_x \subseteq A_y$,

$s(M_x, A_{k_i}) \le s(M_y, A_{k_i})$, for every $A_{k_i} \subseteq A_x$, and

$l(M_x, A_{k_i}) \le l(M_y, A_{k_i})$, for every $A_{k_i} \subseteq A_x$.

A sequential multi-method planner
$M_{k_1} \rightarrow M_{k_2} \rightarrow \cdots \rightarrow M_{k_n}$
is called monotonic if $M_{k_i} \prec M_{k_{i+1}}$ holds for each
$i = 1, \ldots, n-1$. Figure 3 exhibits a restricted dominance graph
for the single-method planners M1, M2, M3, M5, and M6, which are
defined in the previous section. Each node in the graph represents a
single-method planner, and an arc from $M_x$ to $M_y$ implies that
$M_x \prec M_y$ holds. Thus every path in the graph corresponds to a
monotonic multi-method planner. In this example, eight 2-method
planners and four 3-method planners can be constructed.
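As a cross-check on these counts, the sketch below applies the restricted dominance conditions to the averages reported in Table 1 and enumerates the paths of the resulting graph. It assumes that the problem sets are nested (A1 within A2 within A3 within A5 = A6), which the row labels of Table 1 suggest but the text does not state outright; the dictionary layout and function names are choices made for this example.

```python
from itertools import permutations

# Position of each planner's solvable set in the assumed nesting
# A1, A2, A3, A5 (= A6).
LEVEL = {"M1": 0, "M2": 1, "M3": 2, "M5": 3, "M6": 3}

# Table 1(b): average decisions on A1, A2, A3, A5 (None = not solvable).
DECISIONS = {
    "M1": [16.3, None, None, None],
    "M2": [18.7, 28.0, None, None],
    "M3": [19.5, 27.4, 27.8, None],
    "M5": [30.5, 39.1, 40.3, 39.9],
    "M6": [39.0, 49.9, 51.3, 52.6],
}

# Table 1(c): average plan length on the same problem sets.
LENGTH = {
    "M1": [1.82, None, None, None],
    "M2": [1.84, 2.41, None, None],
    "M3": [1.89, 2.37, 2.39, None],
    "M5": [3.61, 4.41, 4.57, 4.57],
    "M6": [3.32, 4.14, 4.17, 4.36],
}


def dominates(x, y):
    """Restricted dominance: coverage of x within coverage of y, and x no
    worse than y in decisions and plan length on every set x can solve."""
    if LEVEL[x] > LEVEL[y]:
        return False
    return all(
        DECISIONS[x][k] <= DECISIONS[y][k] and LENGTH[x][k] <= LENGTH[y][k]
        for k in range(LEVEL[x] + 1)
    )


def paths_from(node, arcs):
    """All paths of two or more planners that start at node."""
    result = []
    for x, y in arcs:
        if x == node:
            result.append([x, y])
            result.extend([x] + tail for tail in paths_from(y, arcs))
    return result


arcs = [(x, y) for x, y in permutations(LEVEL, 2) if dominates(x, y)]
all_paths = [p for node in LEVEL for p in paths_from(node, arcs)]
two_method = [p for p in all_paths if len(p) == 2]
three_method = [p for p in all_paths if len(p) == 3]
```

Under the nesting assumption this yields eight arcs and four 3-planner paths, in agreement with the counts above. The pair M2, M3 is excluded by the reversal in decisions on A2, and M5, M6 by the reversal in plan length.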


Performance  analysis 

For each monotonic multi-method planner
$M_{k_1} \rightarrow M_{k_2} \rightarrow \cdots \rightarrow M_{k_n}$,
there is a corresponding single-method planner $M_{k_n}$ which has the
same coverage of solvable problems. If
$M_{k_1} \rightarrow M_{k_2} \rightarrow \cdots \rightarrow M_{k_n}$
is complete, $M_{k_n}$ is also complete. We compare a complete
monotonic multi-method planner with its corresponding single-method
planner in terms of planning time and plan length.

The probability that an arbitrary problem in $A$ is solvable by
$M_{k_i}$, which is equivalent to $|A_{k_i}| / |A|$, is denoted as
$P_{k_i}$. Let $M_{k_0}$ be a null planner which cannot solve any
problem; that means $A_{k_0} = \emptyset$ and $P_{k_0} = 0$. Let
$B_{k_i} = A_{k_i} - A_{k_{i-1}}$, for $1 \le i \le n$, be the set of
problems which are solvable by $M_{k_i}$ but not by $M_{k_{i-1}}$, and
let $Q_{k_i} = P_{k_i} - P_{k_{i-1}}$, for $1 \le i \le n$.

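Planning time: The expected planning time $s_E$ of a complete
monotonic multi-method planner
$M_{k_1} \rightarrow M_{k_2} \rightarrow \cdots \rightarrow M_{k_n}$
can be represented as

$$s_E(M_{k_1} \rightarrow M_{k_2} \rightarrow \cdots \rightarrow M_{k_n}, A) = \sum_{i=1}^{n} \Big[ Q_{k_i} \cdot \Big( s(M_{k_i}, B_{k_i}) + \sum_{j=1}^{i-1} f(M_{k_j}, B_{k_i}) \Big) \Big]. \tag{1}$$

The performance of the corresponding single-method planner $M_{k_n}$
is $s(M_{k_n}, A)$, which can be rewritten as the sum of the average
planning time for the disjoint problem sets
$A_{k_i} - A_{k_{i-1}}$ $(1 \le i \le n)$:

$$s(M_{k_n}, A) = \sum_{i=1}^{n} \big[ Q_{k_i} \cdot s(M_{k_n}, B_{k_i}) \big]. \tag{2}$$

To compare a monotonic multi-method planner with the corresponding
single-method planner, we need to subtract (1) from (2), yielding

$$s(M_{k_n}, A) - s_E(M_{k_1} \rightarrow M_{k_2} \rightarrow \cdots \rightarrow M_{k_n}, A) = \sum_{i=1}^{n} \Big[ Q_{k_i} \cdot \Big( \big( s(M_{k_n}, B_{k_i}) - s(M_{k_i}, B_{k_i}) \big) - \sum_{j=1}^{i-1} f(M_{k_j}, B_{k_i}) \Big) \Big].$$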

This means that if the performance gain from using a cheaper method,
$s(M_{k_n}, B_{k_i}) - s(M_{k_i}, B_{k_i})$, is greater than the time
wasted by using inappropriate methods,
$\sum_{j=1}^{i-1} f(M_{k_j}, B_{k_i})$, in a monotonic multi-method
planner, then it is preferable to use that multi-method planner over
the single-method planner; otherwise, the single-method planner is
preferred (at least where planning time is concerned).
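The comparison between Equations (1) and (2) can be made concrete with a small numeric sketch. The function names and all the numbers below are hypothetical and chosen only to keep the arithmetic easy to follow; they are not measurements from the experiments.

```python
def expected_multi_time(q, s_own, f_fail):
    """Equation (1).

    q[i]         : Q for the i-th planner in the sequence
    s_own[i]     : s(i-th planner, B_i), cost to succeed on its own set
    f_fail[j][i] : f(j-th planner, B_i), cost for an earlier planner to fail
    """
    total = 0.0
    for i in range(len(q)):
        wasted = sum(f_fail[j][i] for j in range(i))
        total += q[i] * (s_own[i] + wasted)
    return total


def single_time(q, s_last):
    """Equation (2): s_last[i] is s(last planner, B_i)."""
    return sum(q_i * s_i for q_i, s_i in zip(q, s_last))


# Hypothetical 2-method planner: 70% of problems fall in B_1, 30% in B_2.
q = [0.7, 0.3]
s_own = [10.0, 60.0]     # restricted planner on B_1, last planner on B_2
s_last = [30.0, 60.0]    # last planner alone on B_1 and on B_2
f_fail = [[0.0, 20.0]]   # restricted planner wastes 20 before failing on B_2

multi = expected_multi_time(q, s_own, f_fail)   # 0.7*10 + 0.3*(60 + 20)
single = single_time(q, s_last)                 # 0.7*30 + 0.3*60
gain = single - multi
```

Here the saving on B_1 (0.7 times 20) outweighs the wasted effort on B_2 (0.3 times 20), so the multi-method planner comes out ahead by 8 cost units; shrinking the share of B_1 or raising the failure cost reverses the outcome.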

Plan length: The estimated plan length $l_E$ for a complete monotonic
multi-method planner is:

$$l_E(M_{k_1} \rightarrow M_{k_2} \rightarrow \cdots \rightarrow M_{k_n}, A) = \sum_{i=1}^{n} \big[ Q_{k_i} \cdot l(M_{k_i}, B_{k_i}) \big],$$

while the plan length for the corresponding single-method planner
$M_{k_n}$ is

$$l(M_{k_n}, A) = \sum_{i=1}^{n} \big[ Q_{k_i} \cdot l(M_{k_n}, B_{k_i}) \big].$$

If $M_{k_1} \rightarrow M_{k_2} \rightarrow \cdots \rightarrow M_{k_n}$
is monotonic, then $l(M_{k_i}, B_{k_i}) \le l(M_{k_n}, B_{k_i})$.
Therefore the lengths of plans generated from a monotonic multi-method
planner are less than or equal to the lengths of plans generated from
the corresponding single-method planner.
                      Average number          Average
                      of decisions            plan length
Planning type         expected   actual       expected   actual
M1 → M5               39.28      35.31        3.35       3.27
M1 → M6               45.19      42.16        3.34       3.32
M2 → M5               35.33      35.26        2.67       2.63
M2 → M6               37.71      37.61        2.72       2.81
M3 → M5               33.42      33.47        2.48       2.44
M3 → M6               35.50      34.61        2.65       2.6
M1 → M2 → M5          38.52      38.00        2.66       2.76
M1 → M2 → M6          40.90      38.31        2.70       2.82
M1 → M3 → M5          35.99      34.09        2.43       2.42
M1 → M3 → M6          38.07      35.95        2.60       2.57
M5                    -          39.94        -          4.57
M6                    -          52.61        -          4.36

Table 2: Expected and actual (experimental) performance for the
monotonic multi-method planners.
Experimental Results

Equation (1) makes it possible to predict the performance of monotonic
multi-method planners on some problem sets. Since M5 and M6 are the
two complete single-method planners in this domain, ten complete
monotonic multi-method planners can be created from the restricted
dominance graph in Figure 3: six 2-method planners and four 3-method
planners. Table 2 shows the average expected number of decisions and
the average expected plan length for these ten planners.
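The expected plan lengths in Table 2 can be approximately recovered from Table 1 alone. The sketch below is one way to do that check, not necessarily the computation the authors performed. It assumes that each average in Table 1(c) weights problems uniformly, so that Q times the average over B equals P times the average over the larger set minus the previous P times the average over the smaller set; the dictionary layout and the function name are choices made for this example, and the inputs are the rounded table values.

```python
# Fraction of the 100 problems solvable by each planner (Table 1(a)).
P = {"M1": 0.68, "M2": 0.95, "M3": 0.96, "M5": 1.00, "M6": 1.00}

# Average plan length l(M, A_x) from Table 1(c). The inner key names the
# planner whose solvable set A_x the average is taken over.
L = {
    "M1": {"M1": 1.82},
    "M2": {"M1": 1.84, "M2": 2.41},
    "M3": {"M1": 1.89, "M2": 2.37, "M3": 2.39},
    "M5": {"M1": 3.61, "M2": 4.41, "M3": 4.57, "M5": 4.57},
    "M6": {"M1": 3.32, "M2": 4.14, "M3": 4.17, "M6": 4.36},
}


def expected_length(chain):
    """Estimated plan length of a complete monotonic multi-method planner.

    For each planner in the chain, Q * l(M, B) is computed as
    P * l(M, A) - P_prev * l(M, A_prev), which avoids needing l(M, B).
    """
    total = 0.0
    previous = None
    for planner in chain:
        total += P[planner] * L[planner][planner]
        if previous is not None:
            total -= P[previous] * L[planner][previous]
        previous = planner
    return total


two_method_m1_m5 = round(expected_length(["M1", "M5"]), 2)
three_method_m1_m3_m5 = round(expected_length(["M1", "M3", "M5"]), 2)
```

For example, M1 → M5 gives 0.68 × 1.82 + (4.57 − 0.68 × 3.61) ≈ 3.35 and M1 → M3 → M5 gives about 2.43, both matching the expected column of Table 2 to two decimals.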

In order to validate the predictions of the model, we have implemented
these ten planners in Soar. Each single-method planner in a monotonic
multi-method planner was implemented as a specialization of a general
problem-space. Based on the sequence of single-method planners, a set
of meta-level control rules was provided to coordinate which
problem-space is tried next if the current problem-space does not
generate a plan for the given problem. Three repetitions were made for
each problem in the same 100 problem set used in Table 1. Within-trial
learning was turned on for each problem as in the experiments with the
single-method planners, but learned rules were also allowed to
transfer from an earlier method to a later method (for the same
problem). This is equivalent to the type of transfer allowed in the
single-method planners.

The experimental results for decision cycles and plan length are
presented in Table 2. It shows that the experimental decisions are
slightly less than the corresponding expected decisions, while the
experimental plan lengths are quite comparable to the expected plan
lengths. The difference between expected decisions and experimental
decisions is probably due to the across-method transfer of learned
rules; that is, if two methods within a monotonic multi-method planner
have the same bias and the control rules learned from the earlier
method depend on only that bias, then those rules can transfer to the
later method. If this is the case, then across-method transfer is
saving about 10%.

Figure 4 compares the ten monotonic multi-method planners and the two
single-method planners based on the data in Table 2. It shows that
nine out of ten multi-method planners (M1 → M6 is the exception)
outperform the two single-method planners in terms of both number of
decisions and plan length for this set of biases and problems. Among
the multi-method planners, 3-method planners tend to generate
marginally shorter plans than 2-method planners, with a slightly
increased number of decisions (except in the case of M1 → M6).

Figure 4: Monotonic multi-method vs. single-method planners ("o"
represents single-method planners, "+" represents 2-method planners,
and the remaining marker represents 3-method planners.)


Related  Work 

The  basic  approach  of  bias  relaxation  in  multi-method 
planning  is  similar  to  the  shift  of  bias  for  inductive  con¬ 
cept  learning  (Russell  &  Grosof,  1987;  Utgoff,  1986). 

In the planning literature, this approach is closely
related to an ordering modification, which is a control
strategy to prefer exploring some plans before others
(Gratch & DeJong, 1990). Bhatnagar & Mostow
(1990) described a relaxation mechanism for over-general
censors in FAILSAFE-2. However, there are a
number of differences, such as the type of constraints
used, the granularity at which censors are relaxed, and
the way censors are relaxed. SteppingStone (Ruby &
Kibler, 1991) tries constrained search first, and moves
on to unconstrained search if the constrained search
reaches an impasse (within the boundary of ordered
subgoals) and the knowledge stored in memory cannot
resolve the impasse.


Conclusions 

In this paper, we investigated a sequential multi-method
planning approach that can improve the three
planning criteria: planner completeness, planning
efficiency, and plan length. A formal analysis shows that
(1) a monotonic multi-method planner takes less
planning time than the corresponding single-method
planner if the performance gain from using a cheaper method
is greater than the time wasted on inappropriate
methods in the monotonic multi-method planner;
and (2) the lengths of plans generated by a monotonic
multi-method planner are less than or equal to
the lengths of plans generated by the corresponding
single-method planner. The experimental results
obtained so far in the blocks world are consistent with
this model (though there is a small confound due to
learning).
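
Condition (1) can be illustrated with a small calculation for a
two-method sequence. The sketch below is a deliberately simplified
stand-in, not the formal model analyzed in this paper. All symbols are
assumptions introduced here: p is the fraction of problems the
restricted method solves, c1 its cost on success, f1 its cost on
failure, and c2 the cost of the least restricted method, taken to be
the same on every problem.

```python
def expected_costs(p: float, c1: float, f1: float, c2: float):
    """Compare a 2-method sequence with the single least restricted method.

    Assumed (hypothetical) parameters:
      p  - fraction of problems solved by the restricted method
      c1 - decisions spent when the restricted method succeeds
      f1 - decisions wasted when the restricted method fails
      c2 - decisions spent by the least restricted method
    """
    multi = p * c1 + (1 - p) * (f1 + c2)  # expected cost of the sequence
    single = c2                           # cost of the single method
    gain = p * (c2 - c1)                  # saved where the cheap method works
    waste = (1 - p) * f1                  # lost where it is tried in vain
    return multi, single, gain, waste


if __name__ == "__main__":
    # Mostly easy problems: the gain outweighs the waste.
    print(expected_costs(p=0.8, c1=20, f1=30, c2=100))
    # Mostly hard problems: the waste outweighs the gain.
    print(expected_costs(p=0.1, c1=20, f1=30, c2=40))
```

Under these assumptions, single minus multi equals gain minus waste, so
the sequence is cheaper exactly when the gain from the cheaper method
exceeds the time wasted on it.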

The findings in this paper do not necessarily mean
that, for all situations, there exists a monotonic
multi-method planner which outperforms the most efficient
single-method planner. In fact, the performance of
these planners depends on the biases used in the
multi-method planners and the problem set used in the
experiments. For example, if the problems are so complex
that most of them are solvable only by the least
restricted method, the performance loss from trying
inappropriate earlier methods in sequential multi-method
planners would be critical. On the other hand, if the
problems are so trivial that it takes only a few
decisions for the least restricted method to solve them,
the slight performance gain from using more
restricted methods in sequential multi-method planners
might be outweighed by the complexity of the
meta-level processing required to coordinate the sequence of
primitive planners.

It thus remains an open question as to the range of
situations in which multi-method planners will
actually perform better. However, one way to increase the
chances of multi-method planners performing better is
to take advantage of the novel optimization
opportunities that they provide. One possibility is to extend the
scope of the biases to ones that limit the size of the goal
hierarchy to reduce the search space, limit the length of
plans generated to shorten execution time, and lead
to learning more effective rules to increase transfer
(Etzioni, 1990). Another possibility is to learn which
planning method to use for which class of problems. This
can reduce the time wasted by multi-method planners.
Although our current implementation has the
capability to learn rules that select appropriate methods,
the effect of method selection on the overall system
performance has not yet been fully investigated. A
third possibility is to reduce the granularity at which
the individual planning methods are selected and used.
This means that a planning method can be switched
at a subgoal selection point or an operator selection
point, if no path from that point reaches the
goal state. This approach can potentially improve the
performance of multi-method planning if, for example,
there are a significant number of problems where most
of the subgoals are solvable by a very cheap method
while the remainder of the problem requires a more
complex method.
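
The second possibility above, learning which method to use for which
class of problems, can be sketched as a table that records, for each
problem class, the index of the first method that succeeded. This is a
hypothetical illustration and not the rule learning in our Soar
implementation: the class MethodSelector, the classify function, and
the toy methods are all assumptions made for the example.

```python
from typing import Callable, Dict, List, Optional, Sequence

Plan = List[str]
Method = Callable[[str], Optional[Plan]]


class MethodSelector:
    """Skip methods that already failed on an earlier problem of the class."""

    def __init__(self, methods: Sequence[Method],
                 classify: Callable[[str], str]) -> None:
        self.methods = list(methods)      # ordered most to least restricted
        self.classify = classify          # assumed problem-class function
        self.start_at: Dict[str, int] = {}
        self.attempts = 0                 # total method invocations so far

    def plan(self, problem: str) -> Optional[Plan]:
        key = self.classify(problem)
        first = self.start_at.get(key, 0)
        for index in range(first, len(self.methods)):
            self.attempts += 1
            result = self.methods[index](problem)
            if result is not None:
                self.start_at[key] = index  # remember where success began
                return result
        return None


# Toy methods (hypothetical): the restricted one solves only "easy" problems.
def restricted(problem: str) -> Optional[Plan]:
    return ["stack A B"] if problem.startswith("easy") else None


def unrestricted(problem: str) -> Optional[Plan]:
    return ["unstack C A", "stack A B", "stack C A"]


if __name__ == "__main__":
    selector = MethodSelector([restricted, unrestricted],
                              classify=lambda p: p.split(":")[0])
    for name in ("hard:1", "hard:2", "easy:1"):
        selector.plan(name)
    print(selector.start_at, selector.attempts)
```

Because later methods in a monotonic sequence are less restricted,
starting at a later index does not lose solvable problems in this
sketch, though it may give up shorter plans that an earlier method
would have found for some member of the class.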